Smart Contracts Vulnerability Detection Using Machine Learning and Large Language Models

Authors: Rehana Qudsiya, O. B. V. Ramanaiah

DOI Link: https://doi.org/10.22214/ijraset.2026.84002

Abstract

As blockchain technology and smart contracts gain widespread adoption, ensuring their security is essential to mitigate financial and operational risks. Automated vulnerability detection offers a scalable way to identify security flaws in smart contracts before they are deployed. This study uses the Smart Contract Vulnerabilities Dataset from Kaggle, which contains smart contracts annotated with vulnerability labels. Preprocessing includes tokenization, and exploratory data analysis is used to identify meaningful textual patterns. Deep learning models, including LSTM and BERT, are trained and evaluated using accuracy, precision, recall, and F1-score. To further improve detection performance, BERT embeddings are combined with BiLSTM and CNN + LSTM architectures. A Flask-based user interface enables real-time vulnerability prediction. Experimental results show that the CNN + LSTM model outperforms all other evaluated models, achieving 95.05 percent accuracy and demonstrating a strong capability to identify smart contract vulnerabilities.

Introduction

This paper focuses on improving the security of blockchain-based smart contracts by developing an intelligent vulnerability detection framework based on deep learning and large language model (LLM) techniques. Although smart contracts enable secure, transparent, and automated transactions in applications such as decentralized finance (DeFi), healthcare, and supply chain management, coding errors and logical flaws can introduce serious vulnerabilities that lead to financial losses. Traditional static analysis and rule-based tools are limited because they rely on predefined patterns and therefore often fail to detect complex or previously unseen vulnerabilities.

To address these limitations, the study investigates the integration of BERT-based contextual embeddings with deep learning architectures to improve the accuracy, scalability, and interpretability of vulnerability detection. The proposed system uses a labeled dataset of 2,217 Solidity smart contracts containing four vulnerability classes: Reentrancy, Integer Overflow/Underflow, Timestamp Dependency, and Dangerous Delegatecall.

The methodology begins by preprocessing the Solidity source code: comments are removed, the code is normalized, and the vulnerability labels are encoded. The cleaned code is then tokenized with the BERT tokenizer, which converts it into subword token IDs; in the BERT-based models, these token IDs are mapped to contextual embeddings that represent the semantic relationships within the source code. The dataset is split into training (80%) and testing (20%) sets, and four models are developed and compared: LSTM, BERT, BERT + BiLSTM, and LSTM + CNN. Their performance is evaluated using accuracy, precision, recall, and F1-score.
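The preprocessing and splitting steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the comment-stripping regular expressions, the whitespace-collapsing "normalization", the label strings in `CLASSES`, and the use of a stratified split (the paper states only an 80/20 ratio) are all assumptions. The BERT tokenization step is omitted because it depends on the specific pretrained tokenizer.

```python
import random
import re
from collections import defaultdict

# Hypothetical label names for the four classes described in the paper; the
# Kaggle dataset's actual label strings may differ.
CLASSES = [
    "Reentrancy",
    "Integer Overflow/Underflow",
    "Timestamp Dependency",
    "Dangerous Delegatecall",
]
# Encode each class name as an integer ID (0..3).
LABEL_TO_ID = {name: idx for idx, name in enumerate(CLASSES)}

# Solidity uses C-style comments. These regexes are a simplification: they do
# not skip comment markers that appear inside string literals.
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT = re.compile(r"//[^\n]*")


def clean_solidity(source: str) -> str:
    """Remove comments, then collapse all whitespace runs to single spaces."""
    source = _BLOCK_COMMENT.sub(" ", source)
    source = _LINE_COMMENT.sub(" ", source)
    return " ".join(source.split())


def stratified_split(samples, labels, test_frac=0.2, seed=42):
    """Return (train, test) lists of (sample, label) pairs.

    Each class is split separately, so every class is divided roughly 80/20.
    """
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label, items in by_class.items():
        rng.shuffle(items)
        n_test = round(len(items) * test_frac)
        test.extend((s, label) for s in items[:n_test])
        train.extend((s, label) for s in items[n_test:])
    return train, test


if __name__ == "__main__":
    raw = "pragma solidity ^0.4.24;\n// note\ncontract A { /* x */ uint a; }"
    print(clean_solidity(raw))
    # -> pragma solidity ^0.4.24; contract A { uint a; }
    encoded = [LABEL_TO_ID[c] for c in ["Reentrancy", "Dangerous Delegatecall"]]
    print(encoded)  # -> [0, 3]
```

Splitting per class keeps minority vulnerability classes represented in the test set, which matters when per-class precision and recall are reported.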

The literature review highlights the evolution of smart contract vulnerability detection from traditional static analysis to deep learning and transformer-based approaches. While previous studies improved detection accuracy, challenges remain, including limited multi-class detection capability, class imbalance, high computational costs, and insufficient interpretability. This research addresses these gaps by combining contextual embeddings with sequential and convolutional neural networks to achieve more comprehensive vulnerability detection.

Experimental results demonstrate that the LSTM + CNN hybrid model achieved the best performance, with an accuracy of 95.05%, outperforming the standalone LSTM (93.69%), BERT (80.63%), and BERT + BiLSTM (82.43%) models. Its superior performance is attributed to the combination of LSTM layers, which capture sequential dependencies, and convolutional layers, which capture local structural patterns in smart contract code.
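The comparison above is reported as accuracy, while the methodology also lists precision, recall, and F1-score. For a four-class problem these are computed per class and then averaged. The paper does not state which averaging it uses, so the minimal sketch below assumes macro averaging (an unweighted mean over classes); the function name `classification_metrics` and the toy labels are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, classes):
    """Accuracy plus per-class and macro-averaged precision, recall and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # the predicted class gets a false positive
            fn[true] += 1  # the actual class gets a false negative
    per_class = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    k = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        # Macro average: every class counts equally, regardless of its size.
        "macro_precision": sum(m["precision"] for m in per_class.values()) / k,
        "macro_recall": sum(m["recall"] for m in per_class.values()) / k,
        "macro_f1": sum(m["f1"] for m in per_class.values()) / k,
        "per_class": per_class,
    }


if __name__ == "__main__":
    classes = ["Reentrancy", "Overflow", "Timestamp", "Delegatecall"]
    y_true = ["Reentrancy", "Reentrancy", "Timestamp", "Delegatecall", "Overflow"]
    y_pred = ["Reentrancy", "Timestamp", "Timestamp", "Delegatecall", "Overflow"]
    m = classification_metrics(y_true, y_pred, classes)
    print(m["accuracy"], m["macro_precision"], m["macro_recall"])
    # -> 0.8 0.875 0.875
```

Because macro averaging weights each class equally, a model that performs well on the majority class but poorly on a rare one can show high accuracy alongside a noticeably lower macro F1.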

Conclusion

This paper presented a smart contract vulnerability detection framework based on deep learning and transformer-based models for identifying Reentrancy, Integer Overflow/Underflow, Timestamp Dependency, and Dangerous Delegatecall vulnerabilities. The proposed approach included data preprocessing, BERT-based tokenization, feature extraction, and vulnerability classification. Experimental results demonstrated that all evaluated models were capable of detecting vulnerability patterns in smart contract code. Among them, the LSTM + CNN model achieved the best performance, with an accuracy of 95.05%, indicating its effectiveness in capturing both sequential and structural features of smart contracts. Future work will focus on extending the framework to support additional vulnerability types and larger smart contract datasets. The integration of advanced transformer architectures and code-specific language models may further improve detection accuracy. Additionally, techniques such as data augmentation and class balancing can be explored to enhance the classification performance of minority vulnerability classes.
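Class balancing, mentioned above as future work, is often implemented with inverse-frequency class weights: each class c receives w_c = N / (K · n_c), where N is the number of training samples, K the number of classes present, and n_c the number of samples in class c. This is the same "balanced" heuristic used by scikit-learn's `compute_class_weight`. The minimal sketch below computes these weights and applies them in a weighted cross-entropy loss normalized by the total weight. The function names and toy labels are hypothetical, and this is one possible balancing strategy, not one the paper evaluates.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """w_c = N / (K * n_c): rarer classes receive proportionally larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood of the true class.

    probs[i] maps each class to the model's predicted probability for sample i.
    Each sample's loss is scaled by its class weight, and the total is divided
    by the sum of the weights used.
    """
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        total += -w * math.log(p[y])
        weight_sum += w
    return total / weight_sum


if __name__ == "__main__":
    # An imbalanced toy label distribution (not the paper's data).
    labels = ["Reentrancy"] * 6 + ["Timestamp"] * 3 + ["Delegatecall"] * 2 + ["Overflow"]
    print(balanced_class_weights(labels))
    # -> {'Reentrancy': 0.5, 'Timestamp': 1.0, 'Delegatecall': 1.5, 'Overflow': 3.0}
```

With these weights, each class contributes the same total weight (N / K) to the loss, so misclassifying a sample from a rare class costs more than misclassifying one from a common class.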

Copyright

Copyright © 2026 Rehana Qudsiya, O. B. V. Ramanaiah. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET84002

Publish Date : 2026-06-27

ISSN : 2321-9653

Publisher Name : IJRASET
